ABSTRACT: The benefits of dynamic testing are thought to include (a) a reduction
in  strategic  variance,  accompanied  by  (b)  a  test  score  increase  for  "disadvantaged" 
subjects.  Sometimes  forgotten,  however,  is  that  these  accomplishments  are  illusory 
unless  they  support  a  specified  goal  (e.g.,  better  validity).  In  the  present  study,  we 
examine  the  benefits  of  dynamic  test  administration  with  the  Raven’s  Advanced 
Progressive  Matrices  (APM)  test  of  general  intelligence.  The  results  indicate  that, 
while  APM  scores  were  significantly  increased  by  dynamic  procedures,  important 
criteria  such  as  reliability  and  construct  validity  were  not  enhanced.  We  conclude 
that  the  choice  of  dynamic  procedures  depends  on  both  the  ability  construct  and 
the purpose of testing, and should be justified on a case-by-case basis.

In recent years there has been growing interest in alternative testing procedures
such as "dynamic assessment" (Feuerstein 1979) and "testing-the-limits" (Carlson
& Wiedl 1979), partly because of suggestions that these procedures might be superior
to conventional psychometric test methods. For example, one problem with conventional
testing is that identical test scores can mean different things for different
people, as in the following illustration. Suppose that two individuals are taking a
test of spatial ability. If Person A solves the items through transformations of mental
images, while Person B uses an entirely verbal strategy, then their test scores do
not reflect the same underlying construct. Hence, the construct validity of the test
is impeached, and predictive validity may suffer as well. Dynamic testing may solve
this problem because its procedures (including directed practice) can encourage
subjects to exhibit the knowledge and skills that the test was designed to measure,
and to abandon irrelevant strategies. The expected result would be a more
construct-valid test. This, indeed, was the result of a study by Embretson (1987), who
compared the performance of subjects tested on the figure-folding task from the
Differential Aptitude Test (DAT), under either dynamic or control conditions. Embretson's
results show that dynamic testing (involving cues and solution modeling) led to
improvements in both construct validity and predictive validity.

Direct all correspondence to: Gerald E. Larson, Testing Systems Department, Navy Personnel
Research and Development Center, San Diego, CA 92152.

Learning and Individual Differences, Volume 3, Number 2, 1991, pages 123-134.
Copyright © 1991 by JAI Press, Inc. All rights of reproduction in any form reserved.
ISSN: 1041-6080

Hypothetically,  a  reduction  in  inter-individual  strategy  variance  (through  dynamic 
procedures)  might  also  produce  scores  that  are  fairer  to  "disadvantaged"  subjects, 
including  examinees  with  little  test-taking  experience  and/or  test  sophistication. 
Through  directed  practice,  such  subjects  can  be  encouraged  to  abandon  self- 
defeating  strategies  and  thereby  reveal  their  true  competence.  Those  who  believe 
that  dynamic  procedures  enhance  test  fairness  might  point  to  studies  such  as  Dillon 
and  Carlson  (1978),  who  found  that  ethnic  group  differences  on  reasoning  tasks 
were  narrower  in  a  dynamic  condition  than  in  a  control  condition. 

What if inter-individual strategies are important? Partly because of the perceived
benefits mentioned above, the dynamic testing literature has thus far been almost
uniformly positive. Perhaps it is time, therefore, to note that dynamic testing may
not be a panacea, and that it may in fact be theoretically inappropriate for tests of
general intelligence (or g), such as Raven's Advanced Progressive Matrices (APM)
Test (see Note 1). The general intelligence dimension is problematic because it may be
associated with inter-individual strategic variation. Therefore, it is conceivable that
strategy "standardization" through dynamic testing procedures may simultaneously
diminish the usefulness of a test such as the APM, if strategies are themselves indices of
g. And, in fact, there is evidence with which to pursue such an argument.

Haygood and Johnson (1983), for example, illustrate the
greater strategic flexibility of "high-g" subjects given novel tasks. Haygood and
Johnson  employed  the  Sternberg  (1966)  memory-search  task,  in  which  subjects  are 
asked  to  memorize  a  set  of  single  digits  (0-9)  called  the  positive  set.  Next,  subjects 
are  asked  whether  the  positive  set  includes  a  series  of  individually  presented  test 
digits.  Performance  is  measured  via  reaction  time  (RT)  to  test  items.  A  seldom 
emphasized  aspect  of  the  task  is  that  as  more  digits  are  added  to  the  positive  set, 
fewer  remain  in  the  out-group  or  negative  set.  As  the  ratio  shifts,  the  advantage 
of  switching  focus  to  the  negative  set  increases  because  there  are  relatively  fewer 
digits  to  work  with.  For  example,  a  subject  can  verify  that  an  item  is  a  member  of 
a  small  negative  set  faster  than  he  or  she  can  determine  that  it  is  a  member  of  a 
much  larger  positive  set — yet  both  methods  can  produce  the  correct  answer.  Of 
interest  is  Haygood  and  Johnson's  finding  that  subjects  who  scored  high  on  Raven's 
Progressive  Matrices  were  also  quicker  to  shift  to  a  negative  set  focus  and  thereby 
take  advantage  of  the  difference  in  set  sizes. 
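The logic of the negative-set shortcut can be sketched in a few lines. This is an illustrative sketch only, not a model of how subjects actually search memory: the function names are hypothetical, and Python's set lookup does not reproduce the serial search that makes set size matter for human reaction time.

```python
# Illustrative sketch; names are hypothetical and not taken from the cited studies.
DIGITS = set(range(10))  # the task draws on the single digits 0-9


def smaller_set_to_search(positive_set):
    """Return ('positive' or 'negative', that set), whichever holds fewer digits."""
    positive_set = set(positive_set)
    negative_set = DIGITS - positive_set
    if len(negative_set) < len(positive_set):
        return "negative", negative_set
    return "positive", positive_set


def is_in_positive_set(probe, positive_set):
    """Answer the task question by checking only the smaller of the two sets."""
    label, search_set = smaller_set_to_search(positive_set)
    found = probe in search_set
    # A hit in the negative set means the probe is NOT in the positive set.
    return found if label == "positive" else not found
```

With a seven-digit positive set, only three digits remain in the negative set, so the same answer can be reached by checking three items instead of seven.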

A similar finding is reported in a study by Ippel and Beem (1987), who found a
correlation between Raven's Matrices and the point at which subjects shifted from
a clockwise to a counterclockwise direction on a mental rotation task. The shift
pattern of subjects scoring high on the Raven was more "rational," in that they
tended to rotate objects in the shortest direction. The use of dynamic testing
procedures to reduce strategic variation might therefore work against the validity of
g-measures such as Raven's Advanced Progressive Matrices (APM), if tests like the
APM measure intelligence because they are strategy-ambiguous and thereby require
flexibility and invention (see discussion by Kirby and Lawson, 1983).

We raise these issues not because of general misgivings about dynamic testing.
Rather, we are concerned that all procedures have limits, yet the limits of dynamic
testing  have  thus  far  not  been  addressed.  This  hinders  an  informed  decision  by  the 
testing  professional  who  wishes  to  tailor  his/her  method  to  a  specific  situation. 


PURPOSE 

In  the  present  experiment  we  explore  the  effect  of  dynamic  testing  procedures  on 
the  construct  validity  of  the  APM  test.  The  dynamic  testing  package  itself  was 
designed to discourage various counterproductive test-taking strategies which are
known to be used by some low-scoring subjects. If dynamic testing is the best way
to  measure  general  intelligence,  then  our  interventions  should  produce  a  more  valid 
APM  score. 


METHOD 


SUBJECTS 

Subjects  were  808  male  Navy  recruits  (mean  age  19.5  years,  SD  =  2.36  years)  selected 
at  random  from  groups  undergoing  in-processing  at  the  Recruit  Training  Command, 
San  Diego. 

PROCEDURE 

1.  Armed  Services  Vocational  Aptitude  Battery  (ASVAB)  scores  were  gathered 
from  the  recruits'  personnel  records.  The  ASVAB  is  a  set  of  ten  tests  (listed 
in  Table  1)  used  for  selection  and  classification  of  military  applicants.  The 
tests  are  scaled  to  a  mean  of  50  and  a  standard  deviation  of  10  in  an 
unselected,  nationally  representative  sample. 

2. Raven Progressive Matrices. Subjects were group-tested on the 36-item
Raven Advanced Progressive Matrices (APM), Set II (Raven 1962), under
either "standard" (N = 413) or dynamic testing conditions (N = 395) as
described below (see Note 2). One group of about 40 subjects was tested per day.
Group assignment was random. In both the dynamic testing and control
conditions, a time limit was set for solving each item. Time limits were necessary
because correct answers were presented as part of dynamic testing, making
it necessary to have subjects respond before hearing the presentation.
Proctors ensured that all subjects responded when told to do so.


The following item time limits were used, based on several days of pilot testing
during which proctors determined the typical item solution latencies.

Items 1 to 12: 30 seconds per item
Items 13 to 20: 1.0 minute per item
Items 21 to 26: 2.0 minutes per item
Items 27 to 36: 2.5 minutes per item

The  total  item  solution  time  was  thus  51  minutes,  which  is  about  25%  longer  than 
the  40  minute  test-time  limit  suggested  in  the  Raven  manual. 
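The total can be checked directly from the per-item limits listed above. The short sketch below uses only figures from the text; the variable and function names are ours.

```python
# Item blocks as (first item, last item, minutes per item), from the list above.
ITEM_TIME_LIMITS = [
    (1, 12, 0.5),
    (13, 20, 1.0),
    (21, 26, 2.0),
    (27, 36, 2.5),
]


def total_minutes(blocks):
    """Sum (number of items in block) x (minutes per item) over all blocks."""
    return sum((last - first + 1) * minutes for first, last, minutes in blocks)


total = total_minutes(ITEM_TIME_LIMITS)  # 12*0.5 + 8*1.0 + 6*2.0 + 10*2.5
manual_limit = 40                        # minutes, as suggested in the Raven manual
ratio = total / manual_limit

print(f"total = {total} minutes, {100 * (ratio - 1):.1f}% over the manual limit")
```

The blocks sum to 6 + 8 + 12 + 25 = 51 minutes, and 51/40 = 1.275, so the excess over the manual's limit is 27.5%, consistent with the rounded figure of "about 25%".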

Condition 1: Standard Procedures. In this condition, subjects were first given a four-item
practice booklet with the following printed instructions:

You  will  be  shown  a  series  of  items  in  which  part  of  a  pattern  is  missing.  You  must 
pick  the  missing  part  from  8  possible  choices.  Be  sure  and  look  both  down  and  across 
the  pattern  before  making  your  choice.  PLEASE  CIRCLE  THE  CORRECT  ANSWER. 

The  practice  items  were  relatively  easy  problems  taken  from  the  APM,  Set  I.  The 
test  administrator  read  the  instructions  aloud,  and  then  subjects  were  given  as 
much time as needed to complete the booklet. When all subjects had finished, the
proctor gave out the correct answers, along with a very brief explanation of why
each answer was correct. Subjects were allowed to request clarification at any time
during these explanations.

Next,  subjects  were  given  the  36  item  APM  Set  II  test,  with  the  time  limits  for 
items  noted  above.  At  the  end  of  the  time  limit  for  each  item,  subjects  entered  their 
response  on  the  numeric  keypad  of  a  Hewlett-Packard  Integral  Personal  Computer. 


TABLE 1

Tests in ASVAB Forms 11, 12, and 13

Test (Abbreviation): Description

General Science (GS): A 25-item test of knowledge of the physical (13 items) and
biological (12 items) sciences; 11 minutes.

Arithmetic Reasoning (AR): A 30-item test of ability to solve arithmetic word problems;
36 minutes.

Word Knowledge (WK): A 35-item test of knowledge of vocabulary, using words
embedded in sentences (11 items) and synonyms (24 items); 11 minutes.

Paragraph Comprehension (PC): A 15-item test of reading comprehension; 13 minutes.

Numerical Operations (NO): A 50-item speeded test of ability to add, subtract, multiply,
and divide one- and two-digit numbers; 3 minutes.

Coding Speed (CS): An 84-item speeded test of ability to recognize numbers
associated with words from a table; 7 minutes.

Auto and Shop Information (AS): A 25-item test of knowledge of automobiles, shop
practices, and use of tools; 11 minutes.

Mathematics Knowledge (MK): A 25-item test of knowledge of algebra, geometry,
fractions, decimals, and exponents; 24 minutes.

Mechanical Comprehension (MC): A 25-item test of knowledge of mechanical and physical
principles; 19 minutes.

Electronics Information (EI): A 20-item test of knowledge of electronics, radio, and
electrical principles and information; 9 minutes.

Condition 2: Dynamic Testing. The purpose of the APM dynamic testing package was
to reduce self-defeating and/or irrelevant APM strategies, by using
performance-enhancing methods from previous studies. Two of these self-defeating strategies
deserve  special  mention.  They  are  (a)  gestalt  (or  imagery-based)  algorithms  and  (b) 
abbreviated  encoding.  Regarding  the  former,  Hunt  (1974)  suggested  that  at  least 
two  algorithms  could  be  used  to  solve  APM  items.  One,  called  the  gestalt  algorithm, 
emphasizes  the  operations  of  visual  perception,  such  as  the  continuation  of  lines 
through  blank  areas  and  the  superimposition  of  visual  images  upon  each  other.  For 
example,  one  might  imagine  the  existing  lines  in  the  matrix  stretching  into  the 
missing  cell,  without  thinking  abstractly  about  the  properties  of  the  matrix.  The 
second  algorithm,  called  the  analytic  algorithm,  breaks  the  matrix  elements  down 
into  features,  then  employs  logical  operations  to  determine  which  features  and 
relationships  are  critical.  Various  analyses  suggest  that  the  gestalt  algorithm  is 
inferior because it cannot be used to solve difficult items (e.g., Hunt 1974; Kirby &
Lawson 1983; Lawson & Kirby 1981). Therefore, our dynamic procedures
discouraged its use.

The second self-defeating strategy (or tendency) we attempted to discourage
involves abbreviated problem encoding, sometimes associated with impulsivity.
Lawry, Welsh, and Jeffrey (1983), for example, categorized children as reflective or
impulsive (using Kagan's Matching Familiar Figures test) and then studied the
performance of these two groups on matrix items. Lawry et al. found that as the
items became more difficult, reflectives slowed their performance more than did
impulsives. Moreover, slowed responding was associated with higher scores on
the more difficult items, a finding later replicated by Welsh (1987). Correction of
this tendency is therefore a proper goal of dynamic testing, particularly since
impulsivity is thought to disproportionately impair the performance of lower
socioeconomic class subjects (see Turner, Hall, & Grimmett 1973; see also Note 3).

To  summarize  the  rationale  behind  the  design  of  the  dynamic  testing  package, 
we sought to encourage subjects to use optimal analytic strategies. Moreover,
proponents of dynamic testing might argue that our interventions should improve
validity.  The  specific  dynamic  procedures  are  described  below. 


Part 1: Analytic Reflective Instructions. Part 1 involved a test booklet (12 items, 12
minutes) designed to encourage subjects to (a) adopt an analytic approach to
inferring the rule underlying the matrix pattern, and (b) generate and sketch their
own item solutions before looking at the answer choices. The items were all from
the APM Set I.

The  cover  page  of  the  booklet  offered  the  following  advice: 

1. Remember that all problems can be broken down into smaller steps. The steps
can be worked on one at a time.

2. Make sure that your steps follow a logical order toward solving the problem.

3. Work carefully! Many mistakes are made just because people are in too much
of a hurry.

4. If it pays to start over, then start over. Don't stick with something that doesn't
seem to be working.

Each item in the booklet spanned 3 pages. On the first page, the matrix was
presented, along with a space for drawing a picture that would complete the pattern.
The  printed  instructions  asked  the  subjects  to  (1)  analyze  the  changes  in  the  matrix 
pattern  from  left-to-right  and  top-to-bottom,  (2)  figure  out  the  rule  that  explains 
the  changes,  (3)  use  the  rule  to  draw  an  answer,  and  (4)  go  on  to  the  second  page. 
On  page  2  of  the  problem,  subjects  were  again  shown  the  matrix  problem,  but  this 
time  the  answer  choices  were  displayed  along  with  it.  The  printed  instructions 
asked  the  subjects  to  find  and  circle  the  answer  they  had  drawn  on  page  1.  If  their 
answer  was  not  among  the  choices,  subjects  were  instructed  to  (1)  analyze  the 
puzzle  again,  (2)  figure  out  the  rule  that  explains  the  changes  from  left-to-right  and 
top-to-bottom,  (3)  use  the  rule  to  choose  an  answer.  They  were  instructed  not  to 
go  back  and  change  their  drawing,  but  rather  to  go  to  page  3  where  the  correct 
answer  was  shown.  On  page  3,  the  matrix  and  the  answer  choices  were  again 
shown,  with  the  correct  answer  circled.  The  design  of  the  PART  1  booklet  was 
influenced  by  analytic  training  methods  reported  in  Kirby  &  Lawson  (1983);  Lawson 
&  Kirby  (1981);  Malloy,  Mitchell  &  Gordon  (1987);  Sternberg  (1986). 


Part  2:  Rule  Combination  Principle.  In  part  2  of  the  dynamic  testing  session,  the  proctor 
presented  examples  of  12  simple  item  progression  (or  relation)  rules  that  are  common 
in  figural  analogy  problems  (e.g.,  "change  in  size,"  "change  in  shape").  The  rules 
themselves,  which  are  shown  in  Table  2,  are  adapted  from  Jacobs  and  Vandeventer 
(1972).  For  each  of  the  12  rules,  (1)  an  example  was  presented  by  the  proctor  (using 
an  overhead  projector  and  a  portable  projection  screen),  following  which  (2)  the 
subjects  were  asked  to  solve  a  second  example  in  a  booklet.  Then,  (3)  the  proctor 
presented  the  solution  to  the  second  example,  again  using  the  overhead  projector. 
After  all  12  rules  had  been  demonstrated  by  the  proctor  and  attempted  by  the 
subjects, 3 examples of "rules in combination" were presented to show how seemingly
complex problems are sometimes merely combinations of several simple rules.


Part 3: Modeling. The third part of the dynamic testing procedure was embedded in
the actual test session. First, subjects used the allotted time to solve each of the 36
APM Set II items (with the previously described item time limits), after which the
proctors ensured that all subjects entered and recorded their answer on the computer.
This was to prevent subjects from changing an answer once time had expired.
After each answer was recorded, a proctor used the overhead projector to demonstrate
how the problem could have been solved to obtain the correct answer. The
problem solutions were read from a script. Our goal was to model successful,
rule-governed problem solving throughout the course of the test.

To summarize the dynamic testing package, Part 1 was designed to encourage
subjects to view the matrix problems analytically, and to avoid impulsive answer
choice selection by generating and drawing answers before examining the alternatives.
Part 2 was designed to reinforce the concept that problems are rule-governed
and that seemingly complex problems can (sometimes) be analyzed in terms of
combinations of simpler rules, some of which were presented as part of the instruction.
Part 3 allowed the proctor to model, on a continuing basis, a successful method
of problem solving. A dynamic test session typically required 2.5 to 3 hours.


TABLE 2

Twelve Item Rules (a)

1. Identical pattern: Every cell is exactly the same.
2. Shading: Progressive change in shading.
3. Movement in a plane: Figure moves as if slid along a surface.
4. Reversal: Two elements exchange some feature, such as size, shading, or position.
5. Addition: The figure in one column (row) is added to that in the second, and the
   result is placed in the third.
6. Number series: Constant increase in items across cells.
7. Shape: Complete change of form, or systematic change, as from solid to dotted lines.
8. Size: Proportionate change, as in photographic enlargement or reduction.
9. Mirror image: Figure moves as if lifted up and replaced face down.
10. Added element: A new element is introduced, or an old one is taken away.
11. Unique addition: Unique elements are treated differently from common elements,
    e.g., they are added while common elements cancel each other out.
12. Three of a kind: Each element appears three times in a 3 x 3 matrix.

(a) Adapted from Jacobs & Vandeventer (1972).


RESULTS 

First, we calculated a g-score for each individual based on the ten ASVAB tests, to
use as an external (or criterion) measure of g when examining the construct validity
of the Raven. The ASVAB-g score was derived by performing a hierarchical factor
analysis (orthogonalized following Schmid and Leiman 1957) on ASVAB scores from
the 1988 fiscal year Navy applicant sample (N = 147,287). The loadings of the 10
ASVAB tests on the hierarchical factor were subsequently used as weights to
calculate an ASVAB-g for each individual.
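The weighting step can be sketched as follows, assuming the composite is a simple weighted sum of the ten ASVAB standard scores. The loadings in the sketch are hypothetical placeholders chosen for illustration; the actual loadings are not reported in the text, and the function name is ours.

```python
# Hypothetical g loadings for illustration only; the actual values from the
# hierarchical factor analysis are not reported in the text.
HYPOTHETICAL_G_LOADINGS = {
    "GS": 0.70, "AR": 0.75, "WK": 0.70, "PC": 0.65, "NO": 0.50,
    "CS": 0.45, "AS": 0.55, "MK": 0.75, "MC": 0.65, "EI": 0.65,
}


def asvab_g(scores, loadings=HYPOTHETICAL_G_LOADINGS):
    """Weighted sum of ASVAB standard scores, using the g loadings as weights.

    Assumes a plain weighted sum; any further scaling is not modeled here.
    """
    missing = set(loadings) - set(scores)
    if missing:
        raise ValueError(f"missing ASVAB scores: {sorted(missing)}")
    return sum(loadings[test] * scores[test] for test in loadings)


# An examinee scoring at the normative mean (50) on every test.
average_examinee = {test: 50.0 for test in HYPOTHETICAL_G_LOADINGS}
composite = asvab_g(average_examinee)
```

Under this assumption, an examinee at the normative mean on all ten tests receives a composite equal to 50 times the sum of the loadings, which is why the composite is on a different scale from the individual tests.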

Next, comparison statistics were computed to contrast the "dynamic" (N = 395)
vs. "control" (N = 413) groups. The results are shown in Table 3, where it can be
seen that the two groups are almost identical on ASVAB-g scores from pre-enlistment
test sessions. This further indicates that group assignment was indeed random.
However, the APM scores differ significantly (t = -8.11, p < .001), with the dynamic
testing subjects obtaining a higher mean score. This finding is not surprising since
dynamic procedures are designed to be helpful. The essential question is whether
a "better" test emerges, e.g., is there any gain in construct validity?

TABLE 3

Comparison of Standard and Dynamic Testing

                     Raven (APM)                              ASVAB-g                        APM x g
Group       N        Mean          SD            Alpha        Mean          SD               Correlation
Control     413      19.39         [illegible]   .84          [illegible]   [illegible]      .59
Dynamic     395      22.00         [illegible]   .77          325.18        [illegible]      .57
Archival    1641     [illegible]   [illegible]   .83          324.48        [illegible digit]0.44   .56


TABLE 4

Raven (APM) Scores and Validity as a Function of
Aptitude Group and Test Condition

                             LOW-g                   HIGH-g
CONTROL
  N                          102                     103
  APM: Mean, SD              15.06, 5.16             24.55, 4.76
  ASVAB-g: Mean, SD          288.54, 11.49           364.58, 12.46
  APM x g correlation        .27, p < .01            .34, p < .001

DYNAMIC TESTING
  N                          113                     108
  APM: Mean, SD              19.35, 4.49             26.43, 4.00
  ASVAB-g: Mean, SD          289.02, 12.38           363.18, 11.43
  APM x g correlation        .27, p < .01            .27, p < .01

ARCHIVAL
  N                          424                     42[illegible digit]
  APM: Mean, SD              13.58, 4.98             22.06, 4.47
  ASVAB-g: Mean, SD          285.19, 11.46           364.23, 12.18
  APM x g correlation        .27, p < .01            .24, p < .01


To determine whether the dynamic testing APM score would be an improved
measure of g, correlations between APM and the ASVAB-g score were computed.
The correlation between APM and ASVAB-g was .59 (p < .0001) for the "control"
group, and .57 (p < .0001) for the "dynamic testing" group. The difference in
correlations was not statistically significant. Finally, the internal consistency of the APM
was assessed via Cronbach's Alpha, yielding reliabilities of .84 and .77 for scores
obtained under standard and dynamic procedures, respectively. These values are
significantly different (F[412, 394] = 1.438, p < .01). Nothing in these results suggests
that dynamic procedures enhanced the precision of the APM test.
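Both comparisons can be approximately reproduced from the reported summary values. The sketch below assumes the correlations were compared with Fisher's r-to-z transformation for independent samples, and that the reliabilities were compared through the ratio of their (1 - alpha) values, which is the form of Feldt's test and is consistent with the reported degrees of freedom. The text does not name either procedure, so both are assumptions, and the function names are ours.

```python
import math


def fisher_z_difference(r1, n1, r2, n2):
    """z statistic and two-tailed p for the difference between two
    independent correlations, using Fisher's r-to-z transformation."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / standard_error
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_tailed


def alpha_ratio(alpha_higher, alpha_lower):
    """Ratio (1 - lower alpha) / (1 - higher alpha), referred to an F
    distribution with (n_higher - 1, n_lower - 1) degrees of freedom."""
    return (1.0 - alpha_lower) / (1.0 - alpha_higher)


# Control: r = .59, N = 413; dynamic testing: r = .57, N = 395.
z, p = fisher_z_difference(0.59, 413, 0.57, 395)

# Control alpha = .84; dynamic testing alpha = .77.
f_ratio = alpha_ratio(0.84, 0.77)
```

Under these assumptions the z statistic is well below conventional significance thresholds, and the alpha ratio is .23/.16 = 1.4375, which agrees with the reported F of 1.438 on 412 and 394 degrees of freedom.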

Also shown in Table 3 is "archival" data from some of our various research studies
of the last three years. The subjects were comparable to those in the present study
(i.e., Navy men of about the same age). The scores were obtained with 40-minute
self-paced test sessions, however, rather than the current group-paced procedure
wherein time limits were set per item. The archival data was thus collected under
procedures specified in the APM manual, without any experimental intervention
whatsoever. It can be seen that the failure of dynamic procedures to improve
reliability or construct validity is not a function of our control group, since a comparison
of dynamic testing data with archival data leads to similar conclusions.

In the overview we noted that the construct-irrelevant strategies targeted by our
dynamic testing package are most often exhibited by "lower ability" subjects.
Therefore, the greatest treatment effect (and benefits) should be found in the lower ranges.
To examine whether this was indeed the case, we constructed two subsamples,
composed of subjects in the upper and lower quartiles of the full sample distribution
of ASVAB-g scores. Treatment effects were then examined separately for the high-g
and low-g groups. Table 4 displays the within-group correlations between the APM
and the ASVAB-g. (Recall that these correlations reflect construct validity.) As the
table shows, dynamic testing did not affect construct validity in the low ability
subjects (.27 in both conditions). Rather, if any validity change did occur, it was
for high ability subjects and in the wrong direction (.34 vs .27).
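The subsample construction can be sketched as below. This is a minimal illustration under stated assumptions: the record layout, the field names `asvab_g` and `apm`, and the inclusive cut-off rule are ours, and the quartile convention of Python's `statistics.quantiles` may differ from the one used in the original analysis.

```python
import math
import statistics


def quartile_groups(records, key="asvab_g"):
    """Return (low, high): records at or below the first quartile and at or
    above the third quartile of the full-sample distribution of `key`."""
    scores = [record[key] for record in records]
    q1, _median, q3 = statistics.quantiles(scores, n=4)
    low = [record for record in records if record[key] <= q1]
    high = [record for record in records if record[key] >= q3]
    return low, high


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def within_group_validity(group):
    """APM x ASVAB-g correlation computed inside one aptitude group."""
    return pearson_r([r["asvab_g"] for r in group], [r["apm"] for r in group])
```

In use, `quartile_groups` would be applied once to the full sample, and `within_group_validity` would then be computed separately for each aptitude group within each test condition.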


TABLE 5

Rules Governing the APM Items, and Treatment Effects for Items

Raven Item    Rules (a)        Effect of Dynamic Training (p < .05)
 1            6, 12            -
 2            6                -
 3            3                -
 4            2                +
 5            2                +
 6            6                NS
 7            5                +
 8            12               NS
 9            5                +
10            3, 8             +
11            5                +
12            11               +
13            6, 12            NS
14            3                NS
15            5                NS
16            11               +
17            12               +
18            [illegible]      NS
19            5                +
20            5, [illegible]   +
21            2, 3, 8, 12      +
22            11               +
23            11               +
24            1                +
25            2, 5             +
26            3, 12            NS
27            10, 12           NS
28            6, 12            NS
29            3, 8, 12         +
30            [illegible]      NS
31            3, 12            NS
32            3                NS
33            5, 11            NS
34            6, 12            +
35            5, 11            NS
36            5, 11            NS

(a) From Table [number illegible]
Also, it must be noted that the untrained "high g" group in Table 4 outperformed
the fully trained "low g" group (t[214] = -8.26, p < .001), suggesting that the
APM is measuring a genuine intellectual trait on which relatively stable individual
differences exist. The archival data supports these conclusions.
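The reported statistic can be recovered from the Table 4 summary values, assuming a pooled-variance independent-samples t-test (the text does not name the test) and reading the group sizes as 113 (dynamic testing, low g) and 103 (control, high g). The function name is ours.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t statistic with pooled variance, computed from
    group means, standard deviations, and sizes. Returns (t, df)."""
    df = n1 + n2 - 2
    pooled_variance = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_variance * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / standard_error, df


# Fully trained low-g group (dynamic testing): APM mean 19.35, SD 4.49, N 113.
# Untrained high-g group (control):            APM mean 24.55, SD 4.76, N 103.
t, df = pooled_t(19.35, 4.49, 113, 24.55, 4.76, 103)
print(f"t({df}) = {t:.2f}")
```

With these inputs the degrees of freedom are 113 + 103 - 2 = 214 and the statistic rounds to -8.26, matching the reported values.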

Finally,  we  constructed  Table  5  to  show  how  the  12  rules  presented  in  training 
apply to the 36 APM items, and also to clarify which items were significantly
enhanced  by  the  dynamic  procedures.  (Item/rule  relationships  were  independently 
determined  by  two  of  the  authors,  with  a  third  author  resolving  disputes).  Inspection 
of  the  table  shows  that,  as  expected,  multiple  rules  are  more  common  on  the  later 
(harder)  APM  test  items.  All  significant  treatment  effects  in  Table  5  indicate  that 
the  dynamic  testing  group  had  a  higher  proportion  correct  than  control  subjects, 
except for the first three items, where dynamic testing was associated with a significant
drop in performance. From examining the table, it is apparent that most of
the  score  gain  from  dynamic  procedures  came  in  the  middle  portion  of  the  test. 


DISCUSSION 

While  dynamic  testing  is  usually  seen  as  supporting  the  construct  validity  of  tests, 
its usefulness may have limits, as in the case of a general intelligence test. One
reason  is  that  previous  studies  (Haygood  &  Johnson  1983;  Ippel  &  Beem  1987)  link 
g  to  the  kind  of  strategic  variation  that  might  be  diminished  by  dynamic  procedures. 
Conceivably,  dynamic  testing  might  actually  reduce  validity.  Our  results,  however, 
show  little  harm  from  dynamic  administration  of  Raven's  Advanced  Progressive 
Matrices.  Since  no  psychometric  benefit  was  obtained  either,  there  seems  little  reason 
to  undertake  time-consuming  dynamic  procedures  for  a  test  like  the  APM. 

The significant raw score gain following training highlights the importance of
testing all examinees under the same conditions when scores will be used operationally.
However, despite significant score gains following training, evidence for the
stability of general intelligence scores was also obtained. That is, the untrained "high
ASVAB-g" group outperformed the fully trained "low ASVAB-g" group on a second
measure of g (the APM). Even several hours of training and problem solving
demonstrations were insufficient to allow low-g subjects to perform like high-g subjects.
This suggests that the Raven scores reflect somewhat unmalleable qualities such as
induction and working memory capacity (see Carpenter, Just, & Shell 1990) rather
than strategic differences. But if so, then why do Raven scores correlate with strategic
differences on other tasks like memory scanning and mental rotation? One possible
explanation is that if the Raven is, indeed, measuring something like working
memory capacity, then subjects with high capacity may have greater "reserves" or
spare capacity for self-monitoring on laboratory tasks, allowing the discovery of
efficient strategies. This theory would predict that strategic differences would not
be so apparent if all subjects were working at maximum capacity.

One possible criticism of the present study is that the ASVAB is not necessarily
a good measure of g and that it is therefore an inadequate criterion by which to
judge the construct validity of the Raven. In response, we turn to conventional
wisdom. Convention suggests that when a 10-test aptitude battery with fair content
diversity (the ASVAB) is subjected to a hierarchical factor analysis, the first
hierarchical factor is a satisfactory estimate of g. We feel that the relationship between
this hierarchical g and the Raven is a fair and reasonable index of the degree to
which the Raven measures general intelligence. We interpret changes in this
relationship following training as changes in the effectiveness of the Raven.

We  do  not  see  our  results  as  a  general  indictment  of  dynamic  procedures.  Rather, 
we are primarily concerned that the limits of these procedures be addressed.
To  simply  "boost"  a  score  through  dynamic  procedures  serves  no  useful  purpose 
unless  the  result  is  more  reliable  and/or  valid. 

ACKNOWLEDGMENT:  The  opinions  expressed  in  this  article  are  those  of  the  authors,  are 
not  official  and  do  not  necessarily  reflect  the  views  of  the  Navy  Department. 


NOTES 

1.  When  test  data  are  subjected  to  either  hierarchical  factoring  or  multidimensional 
scaling, reasoning tests such as Raven's Progressive Matrices are the best markers for the
general variance in the battery (Marshalek, Lohman, & Snow 1983). Also, LISREL analyses
indicate  that  reasoning  tests  are  excellent  measures  of  g  (Undheim  &  Gustafsson,  1987). 

2.  Considerably  smaller  numbers  of  subjects  underwent  "partial  dynamic  testing." 
Since  the  N's  are  smaller  and  the  results  are  consistent  with  those  already  being  reported, 
there  is  little  to  be  gained  by  presenting  the  partial  treatment  data. 

3. While these studies all involved children, it is quite likely that a dimension like
impulsivity affects the "encoding time" parameter in performance models on adult subjects. For
example, Sternberg (1977) found that adult subjects who are relatively poor at solving
analogies also tend to spend less time encoding the items. Indeed, a "deep encoding" style can be
said to be a characteristic of highly skilled (or expert) performance in general (Nickerson 1988).